Streaming the world of Horizon 


Decima Asset Streaming System 


Introduction 


Killzone Loading System 


• Traditional level- and section-based loading
• Loading screen while loading the initial section's assets
• Corridor sections to unload old and load new sections
• Corridors were mostly one-way
• Could not load content around the player dynamically
• File packing caused long iteration times for artists


Horizon Streaming System Design Goals 


No loading screens except for startup and fast travel 
No corridors, content should stream organically 
Continuous loading of content around player 

Faster iteration by not packing data 


Decima Asset Structure 


CoreText format 


• All objects defined in custom text format
• Generated with in-house editor, Maya
• Object types map to C++ classes
• Objects have attributes and links to other objects
• Horizon content:
  - 300,000 files
  - 16 million objects
  - 20 million links


Enemies = [] 
FriendlyFactionsExludedForLOFChecks = [] 
ClaimGroup = "or 


DestructibilityPart 


{ 


!Name "DestructibilityPart"
!UUID "602c83dc-7cb0-4859-bb8a-eff4aa328e98"
Enabled = "True" 
Health = "100" 
DamageSponge = "False" 
DamageToEntityMultiplier = "or 
ClampCoreDamageToPartHealth = "False" 
LimitMaxCoreHealth = "False" 
BoneName = ""
LocalMatrix =
{
"(1 0 0 0)"
"(0 1 0 0)"
"(0 0 1 0)"
"(0 0 0 1)"
}
RandomLocalMatrix = <RandomMatrixResource> 
InitialState = <> 
TagProperties = [] 
General 
{ 
Name = "DestructibilityPart" 
} 
} 
DestructibilityResource 
{ 
!Name "DestructibilityResource" 
!UUID "9215641a-740e-4267-87a0-471a5804a4be" 


General 
{ 
Name = "DestructibilityResource"


The CoreText format


Content graphs 


• Any set of linked files represents a graph
• There are no cycles in the content graph
• Graphs must always be loaded fully
• Many subgraphs partially overlap
• Subgraphs are separated by Streaming Links


Content Graphs 


[Figure: example content graph with the nodes levels/game, levels/worlds/world/leveldata, an Object Collection in the lighting file of tile x04 y-04, and a target PrefabInstance (spiderwebs) in the same tile's geometry layer.]

HintTrigger = «/levels/worlds/world/tiles/tile x04 y-04/layers/gameplay/mg1 5 streamingtriggers:t mqi 5 lighting hint» 
ActivateTrigger = «/levels/worlds/world/tiles/tile x04 y-04/layers/gameplay/mg1 5 streamingtriggers:t mgl 5 lighting activate»
ObjectCollection = «/levels/worlds/world/tiles/tile x04 y-04/loddefault mg1 5 lighting:MQO1 5 Lighting» 

ObjectCollections = [] 


HintedFact = <> 




Conversion 


Conversion System 


Process content graph recursively 

Translate each CoreText file into binary data 
Optimize content for runtime usage 
Generate loading hints, runtime code libraries 


Asset Conversion


[Figure: the Converter turns assets into game data (shipped on disk), CoreDebug debug data, and Dependencies conversion data.]


Post-Conversion 


• For Horizon, 300,000 CoreText files generated:
  - 189,000 Core files, 20GiB
  - 30,000 CoreStream files, 15GiB
• 1.2GiB of localized data per language


File Loading 


Overview 


• To load an object, its file must be loaded
• The objects in that file often link to other objects
• Loading and initializing object graphs is depth-first
• Any object (and any file) is only in memory once


Core File Loading 


Objects go through many phases during loading: 

• Deserialization: create object and read attributes
• Link resolving: set pointers to linked objects
• Initialization: allow the object to execute init code
• Activation: add to world, physics, other systems


Load ordering 


• Ordering is important:
  - Any objects pointed to must always be initialized first
  - Processing is depth-first in graph order
• Full graph must be known when loading assets


Loading Process 


• Determine file graph
• Remove files already loaded
• Queue remaining files for async I/O (depth-first)
• Create job graph for object initialization (depth-first)
• Deserialize files into objects
• Run initialization jobs for completed files
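
The first three steps can be sketched as a depth-first walk that skips files already in memory. This is a minimal illustration, not Decima's code: the `links` table, the file names and the function `plan_load` are all hypothetical.

```python
def plan_load(root, links, loaded):
    """Return the files to read for `root`, dependencies first.

    links  -- hypothetical map: file name -> list of files it links to
    loaded -- set of files already in memory (these are skipped)
    """
    order = []
    seen = set(loaded)

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in links.get(name, []):
            visit(dep)        # depth-first: dependencies are visited first
        order.append(name)    # a file is emitted after everything it links to

    visit(root)
    return order


# Hypothetical file graph; "mesh" is assumed to be loaded already.
links = {
    "character": ["model"],
    "model": ["mesh", "material"],
    "material": ["texture"],
}
print(plan_load("character", links, loaded={"mesh"}))
```

Because every file appears after the files it links to, the same order can drive both the I/O queue and the initialization jobs.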


Reference counting Core Files 


Files are reference counted 

When loading a subgraph, skip loaded files 
Instead, take reference to loaded files 

Files are unloaded automatically on last release 
No object is ever loaded more than once 


Reference Counting Files


• Let's start loading a character:

NPC_Blond

• This character needs these files:

NPC_Blond → HeadModel → HeadMesh → HeadTexture

• No files are loaded at this point

• We load these files and get this graph:

NPC_Blond (1) → BModel (1) → BMesh (1) → Texture (1)

• Each file currently has a reference count of 1

• We wish to load a second character:

NPC_Grey

• This character consists of these parts:

NPC_Grey → GModel → BMesh → Texture

• We know that two of these files (BMesh and Texture) are already loaded
• So we load only these files:

NPC_Grey → GModel

• This leads to the new graph:

[Figure: both characters now link to the same BMesh file]

• Now BMesh has two references, and is shared

• We unload the first character:

NPC_Blond (0) → BModel (0)
NPC_Grey (1) → GModel (1) → BMesh (1) → Texture (1)

• BMesh now has one reference

• We unload the second character:

NPC_Blond (0) → BModel (0)
NPC_Grey (0) → GModel (0) → BMesh (0) → Texture (0)

• All files have a reference count of zero and are unloaded
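
The walkthrough above can be replayed with a small sketch. It assumes each file holds one reference on every file it links to, which matches the counts shown on the slides; the class `FileCache` and its methods are hypothetical names, not Decima's API.

```python
class FileCache:
    def __init__(self, links):
        self.links = links   # file -> files it links to (hypothetical data)
        self.refs = {}       # file -> reference count; absent means unloaded

    def acquire(self, name):
        if name in self.refs:
            self.refs[name] += 1     # already loaded: only take a reference
            return
        self.refs[name] = 1          # "load" the file
        for dep in self.links.get(name, []):
            self.acquire(dep)

    def release(self, name):
        self.refs[name] -= 1
        if self.refs[name] == 0:
            del self.refs[name]      # last release: unload the file
            for dep in self.links.get(name, []):
                self.release(dep)


links = {
    "NPC_Blond": ["BModel"], "BModel": ["BMesh"],
    "NPC_Grey": ["GModel"], "GModel": ["BMesh"],
    "BMesh": ["Texture"],
}
cache = FileCache(links)
cache.acquire("NPC_Blond")
cache.acquire("NPC_Grey")
shared_count = cache.refs["BMesh"]   # BMesh is shared by both characters
cache.release("NPC_Blond")
after_first_unload = dict(cache.refs)
cache.release("NPC_Grey")
print(shared_count, after_first_unload, cache.refs)
```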


Prefetching 


Prefetch Files 


• Represent the file hierarchy of the entire game
• Generated during conversion
• Simple to determine file graph for any given file
• Very little data, ~20MiB on disk/in memory


Prefetch Files 


Assume we're working with these file graphs:

[Figure: file graphs for A and B]

This is the Prefetch list for these files:

[Figure: Prefetch list]


Graph A


• When traversing for A, we get sequence:

[Figure: sequence for A]

• Which corresponds with the depth-first graph:

[Figure: depth-first graph for A]


Graph B


• When traversing for B, we get sequence:

5 | 9 | 7 | 8 | 6 | B

• Which corresponds with the depth-first graph:

Texture - 5 → Material - 9 → Mesh - 8 → Model - 6 → Character - B


Streaming Strategies 


Streaming Links 


• Roots of subgraphs
  - Tiles, Characters, Weapons
• Loaded on demand
• Are loaded by Streaming Strategies
• Often overlap with other loaded graphs


Streaming Strategies 


Intermediary between game and streaming system 
Determine when to load/unload subgraphs 
Customizable logic for different domains 
Evaluated once per frame to queue load/unload 


AlwaysLoaded Streaming Strategy 


• Responsible for loading initial game content
  - System assets
  - World data
  - Aloy
• Loaded at startup, never unloaded


TileBased Streaming Strategy 


• Loads/unloads tiles around player
• Four tile resolutions:
  - 9 High, full resolution tiles
  - 9 Medium, medium res geometry and physics mesh
  - 9 Low, low res baked geometry
  - 12 Very low, always loaded
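
A per-frame evaluation for the high-resolution tiles might look like the sketch below. It assumes the 9 High tiles form a 3x3 block centred on the player's tile; the slides only give the count. `TILE_SIZE`, `tiles_around` and `evaluate` are hypothetical.

```python
TILE_SIZE = 512.0   # hypothetical tile edge length in metres


def tiles_around(x, y, radius=1):
    """Tile coordinates in a (2*radius+1)^2 block around a world position."""
    cx, cy = int(x // TILE_SIZE), int(y // TILE_SIZE)
    return {(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def evaluate(player_pos, loaded):
    """Return (tiles to load, tiles to unload) for this frame."""
    wanted = tiles_around(*player_pos)
    return sorted(wanted - loaded), sorted(loaded - wanted)


loaded = tiles_around(100.0, 100.0)                     # player starts in tile (0, 0)
to_load, to_unload = evaluate((600.0, 100.0), loaded)   # player moves to tile (1, 0)
print(to_load, to_unload)
```

Only the difference between the wanted and loaded sets is queued, so a player who stays inside one tile generates no streaming work.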




AreaBased Streaming Strategy 


• Primary hull around player for loading hint
• Secondary hull around player for activation hint
• Used for quests, scenes, other dynamic encounters


ProgramBased Streaming Strategy


• Evaluates custom programs created by designers
• Used for complex streaming scenarios
• Uses player and world facts for dynamic content


Streaming Strategy Conclusions 


• Streaming system is easy to maintain
• Can easily be extended by adding strategies
• Designers can implement streaming logic


Packaging 


Packaging 


• We had ~220,000 files for final package
• Exceeds limit of PS4 packages
• Opening and closing so many files is costly
• Needed a better way of shipping content


PackFiles 


• Recombine files into very small number of large files
• Keep files open at all times
• Keep file directory in memory
• Optimize file order for most linear access
• Compress files to fit all content on disk


Low File Count


Contents of the Gold Master folder:

Name                      Size
Movies                    (folder)
sce_module                (folder)
sce_sys                   (folder)
eboot.bin                 135,656 KB
fullgame.prx              168,405 KB
Initial.bin               14,143,318 KB
Initial English.bin       572,197 KB
Initial Japanese.bin      595,918 KB
Remainder.bin             13,219,986 KB
Remainder English.bin     652,209 KB
Remainder Japanese.bin    634,920 KB


Files open for duration of game 


Keeping Files Open 


• On start, open files and read file directories
• In-memory file directory
• Only use sceKernelPreadv
• No calls to open, lseek, or close
• Dramatically improves performance


Optimized file order 


• Scan content graph to discover all file links
• Split files between initial/remainder/localized groups
• Group files in subgraphs based on streaming links
• Order files in groups on graph order, depth-first


Compression 


• Write sorted, uncompressed files to 256KiB blocks
• Compress blocks
• Write compressed blocks sequentially
• Index maps from logical to physical offset
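
The four steps can be sketched as follows. The slides do not name the compressor or the index layout, so `zlib` and the `(physical offset, compressed size)` tuples are stand-ins chosen for illustration; `pack` and `read` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 256 * 1024   # 256 KiB of uncompressed data per block


def pack(data):
    """Compress `data` block by block; return (blob, index)."""
    blob = bytearray()
    index = []   # per block: (physical offset, compressed size)
    for start in range(0, len(data), BLOCK_SIZE):
        comp = zlib.compress(data[start:start + BLOCK_SIZE])
        index.append((len(blob), len(comp)))
        blob += comp
    return bytes(blob), index


def read(blob, index, offset, size):
    """Read `size` > 0 bytes at logical (uncompressed) `offset`."""
    first = offset // BLOCK_SIZE
    last = (offset + size - 1) // BLOCK_SIZE
    out = bytearray()
    for block in range(first, last + 1):
        phys, comp_size = index[block]   # logical block -> physical location
        out += zlib.decompress(blob[phys:phys + comp_size])
    skip = offset - first * BLOCK_SIZE
    return bytes(out[skip:skip + size])


data = bytes(range(256)) * 4096   # 1 MiB of sample data
blob, index = pack(data)
print(len(index), read(blob, index, 300_000, 1000) == data[300_000:301_000])
```

A read only decompresses the blocks that overlap the requested logical range, which is why sorting files in load order keeps access mostly linear.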


Sorting File Graphs 


Packing Files 


Compressing Blocks 


Block Mapping 


Patching 


• A patch file is a regular PackFile
• Contains added/modified files since Gold Master
• Index is overlay on Gold Master file index
• Lookup finds patched file entry
• Simple code, no delta compression
• Current 1.30 patch is only ~98MiB of content
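
An overlay index can be as simple as checking the newest index first. The sketch below assumes each index maps a file name to an `(offset, size)` pair; that layout, the file names `a.core`, `b.core`, `c.core`, and the pack name `Patch.bin` are all hypothetical.

```python
def lookup(name, indices):
    """Find a file entry; `indices` is ordered newest patch first, Gold Master last."""
    for pack_name, index in indices:
        if name in index:
            return pack_name, index[name]
    raise KeyError(name)


# Hypothetical indices: file name -> (offset, size) within the pack file.
gold = ("Initial.bin", {"a.core": (0, 100), "b.core": (100, 50)})
patch = ("Patch.bin", {"b.core": (0, 60), "c.core": (60, 10)})
indices = [patch, gold]

print(lookup("a.core", indices))   # unchanged file, served from Gold Master
print(lookup("b.core", indices))   # modified file, served from the patch
```

Whole files are replaced rather than diffed, which keeps the lookup code trivial at the cost of slightly larger patches.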


Development 


No PackFiles in development 

Instead, files are loaded from host PC via socket 
Host PC keeps files in memory 

PS4 HDD only used for testing packages 


I/O Performance Results


Killzone Large Files       ~3000       ~3000       ~50MiB/sec
Horizon MemCache           ~200,000    ~200,000    ~90MiB/sec
Horizon Shipped Package    4           ~200,000    ~60MiB/sec


• Small files improved iteration times enormously
• No need to do any packaging during production
• Packing files only for shipping works great


Memory Management 


Memory Layout 


High-level view of memory layout 


[Figure: Memory Configuration debug view with "Simulate Submission Build" enabled, showing the memory pools, the Streaming Buffer and Flexible Memory (fullgame.prx). Available for assets in Submission (estimated): 2054.81 MiB. Current amount of loaded assets: 1186.23 MiB.]


Heap 


• Fixed size, ~800MiB, Onion
• Managed by DLMalloc




GNM Video Pool 


• Fixed size, ~500MiB, Garlic
• Render targets, contexts, compute shaders




GNM Shared Pool 


• Fixed size, ~300MiB, Garlic
• Subsystem-specific VRAM data




RenderData Pool 


• Variable size, Garlic
• Contains textures, meshes, shaders




AssetMemory Pool 


• Variable size, Onion
• Contains object data




Flexible Memory 


• ELF, PRX, Stacks
• No application data




AssetMemory and RenderData 


• Share physical memory, not virtual memory
• All physical memory initially allocated to RenderData
• AssetMemory requests/returns physical memory
• RenderData provides physical memory on demand


RenderData Pool 


RenderData Pool 


• Manages VRAM
• Contains textures, meshes, shaders
• Has static and streaming assets
  - Static: always loaded when objects are loaded
  - Streaming: optional mesh LODs/texture MIPs
• Defragmented continuously


Layout


Contiguous mapped virtual memory range
2MiB page size
Maintains block list (free/used)
Free blocks moved to end of range
Map/unmap physical memory at end of range


[Figure: RenderData Pool view]


Defragmentation


• Defragmentation has 3 phases:
  1. Frame M: Copy used blocks down to fill free space
  2. Frame M: Move free blocks up to end of range
  3. Frame N: Free copied blocks, then back to (1)
• Copied blocks must linger for 1 frame because they may still be in use
• Next frame the new address is used and the old block is freed


Defragmentation Details 


• Runs at start of every frame
• CPU determines which blocks to copy
• Maximum of 16MiB copied per frame
• Determines new address for copied blocks
• Schedules copy commands as Async Compute jobs
• After copy, updates handles with new addresses


Defragmentation


[Figure sequence: move used block down; move free block up; move free block up]


Asset Allocator


Asset Allocator 


Contains objects created through streaming 
Layered allocator 

Manages virtual memory ranges 

Uses physical memory requested from RenderData 


Asset Allocator Structure 


Block Allocator
Virtual memory ranges, multiple of 128KiB
Physical memory, multiple of page size

SmallBlock Allocator    LargeBlock Allocator    Linear Allocator
Sizes <= 32KiB          Sizes > 32KiB           Contiguous allocation
Uses 2MiB blocks        Any size blocks         Increments of 2MiB


Block Allocator


Manages 1GiB regions of virtual memory
Splits regions into 128KiB blocks
Each block represented by 64B header
Header contains pointer to SubAllocator
Headers contiguous at start of region
512KiB overhead per 1GiB
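
The 512KiB figure follows from the block and header sizes, and the contiguous headers make the pointer-to-header lookup pure arithmetic. The sketch below assumes regions are aligned to 1GiB so the region base can be recovered by masking; that alignment and the name `header_address` are assumptions, not stated on the slides.

```python
KiB, GiB = 1024, 1024 ** 3

REGION_SIZE = 1 * GiB
BLOCK_SIZE = 128 * KiB
HEADER_SIZE = 64

blocks_per_region = REGION_SIZE // BLOCK_SIZE     # number of 128KiB blocks
header_bytes = blocks_per_region * HEADER_SIZE    # total header overhead per region


def header_address(pointer):
    """Locate the block header for any pointer inside a region."""
    region_base = pointer & ~(REGION_SIZE - 1)    # assumes 1GiB-aligned regions
    block = (pointer - region_base) // BLOCK_SIZE
    return region_base + block * HEADER_SIZE


print(blocks_per_region, header_bytes // KiB)
```

On deallocation, this kind of lookup is what lets a raw pointer be resolved to its owning SubAllocator without any search.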


Preventing fragmentation 


• Prevent fragmentation of virtual memory:
  - Large virtual memory allocations (128KiB increments)
  - Don't mix unrelated allocations (lifetime/size)
• Prevent fragmentation of physical memory:
  - Combine equal-size blocks
  - Combine blocks with same lifetime
  - Commit physical memory in 16KiB increments


Allocation 


• SubAllocator requests block of size N
• BlockAllocator:
  - Ensures enough physical memory is available
  - Allocates align_up(N, 128KiB) virtual address range
  - Maps align_up(N, 16KiB) physical memory to range
  - Sets pointer to SubAllocator in block header


Deallocation 


• Pointer resolved to SubAllocator
• SubAllocator:
  - Updates own bookkeeping for block
  - If block empty, returns it to BlockAllocator
• BlockAllocator:
  - Unmaps physical memory
  - Marks virtual range as free
  - Updates physical free size


Obtaining physical memory 


• BlockAllocator:
  - Requests 64MiB from RenderData
• RenderData:
  - Unstreams low-priority LODs/MIPs
  - Defragments free space to end of range
  - Unmaps 64MiB and shrinks RenderData
• Available (unmapped) memory grows by 64MiB


Releasing physical memory 


• BlockAllocator:
  - If > 64MiB physical memory free, notifies RenderData
• RenderData Pool:
  - Maps 64MiB physical memory at end of pool
  - Grows pool size
  - Starts streaming LODs/MIPs into available memory


SmallBlockAllocator 


Manages allocations <= 32KiB 

Buckets per size class 

Each bucket is linked list of 2MiB blocks 

Each block is split into 2MiB/(size class) entries 
Free list maintained in empty entries 
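
A bucket for one size class can be sketched as below. The slides do not list the size classes, so the power-of-two classes from 16 B to 32KiB are an assumption, and the Python list stands in for a free list threaded through the empty entries themselves. `Bucket` and `size_class_for` are hypothetical names.

```python
BLOCK_SIZE = 2 * 1024 * 1024                    # each bucket grows in 2MiB blocks
SIZE_CLASSES = [16 << i for i in range(12)]     # assumed classes: 16 B .. 32 KiB


def size_class_for(n):
    """Smallest size class that fits an allocation of n bytes."""
    return next(c for c in SIZE_CLASSES if c >= n)


class Bucket:
    def __init__(self, size_class):
        self.size_class = size_class
        self.entries_per_block = BLOCK_SIZE // size_class
        self.blocks = 0
        self.free = []   # stand-in for the free list kept in empty entries

    def alloc(self):
        if not self.free:                        # bucket exhausted: add a 2MiB block
            base = self.blocks * self.entries_per_block
            self.blocks += 1
            self.free = list(range(base + self.entries_per_block - 1, base - 1, -1))
        return self.free.pop()                   # entry index within the bucket

    def release(self, entry):
        self.free.append(entry)


bucket = Bucket(size_class_for(33))
first, second = bucket.alloc(), bucket.alloc()
bucket.release(first)
reused = bucket.alloc()
print(bucket.size_class, bucket.entries_per_block, first, second, reused)
```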


LargeBlockAllocator 


• Single allocations > 32KiB
• Allocates align_up(N, 16KiB) from BlockAllocator
• BlockAllocator allocates align_up(N, 128KiB) range
• BlockAllocator maps align_up(N, 16KiB) memory
• Average overhead:
  - 8KiB physical
  - 64KiB virtual
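
The average overhead figures are half of each alignment, which can be checked numerically. The sketch assumes request sizes are spread evenly over the range; real size distributions will differ. `mean_overhead` is a hypothetical helper.

```python
KiB = 1024


def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def mean_overhead(alignment, first, count):
    """Average padding over `count` consecutive request sizes."""
    total = sum(align_up(n, alignment) - n for n in range(first, first + count))
    return total / count


first = 32 * KiB + 1    # smallest size handled by the LargeBlockAllocator
count = 128 * KiB       # one full period of the larger alignment
physical = mean_overhead(16 * KiB, first, count)
virtual = mean_overhead(128 * KiB, first, count)
print(physical / KiB, virtual / KiB)
```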


LinearAllocator 


Collects multiple >32KiB allocations 

Maintains multiple >= 2MiB blocks 

Only used for allocations with identical lifetime 
Memory freed only after all allocations freed 
Fast alloc, only increment pointer in large block 
Fast free, release all blocks when done 

Low overhead for alignment/bookkeeping 
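
A bump allocator with these properties can be sketched as follows. Addresses are modelled as `(block index, offset)` pairs and blocks are tracked only by size; this representation and the class name are illustrative, not Decima's.

```python
MiB = 1024 * 1024
BLOCK_STEP = 2 * MiB


def align_up(n, alignment):
    return (n + alignment - 1) // alignment * alignment


class LinearAllocator:
    def __init__(self):
        self.blocks = []   # sizes of the blocks obtained so far
        self.offset = 0    # bump pointer within the newest block
        self.live = 0      # number of outstanding allocations

    def alloc(self, size):
        if not self.blocks or self.offset + size > self.blocks[-1]:
            self.blocks.append(align_up(size, BLOCK_STEP))   # new block, >= 2MiB
            self.offset = 0
        addr = (len(self.blocks) - 1, self.offset)
        self.offset += size     # fast path: only the pointer moves
        self.live += 1
        return addr

    def free(self, addr):
        self.live -= 1
        if self.live == 0:      # memory is released only when everything is freed
            self.blocks.clear()
            self.offset = 0


allocator = LinearAllocator()
a = allocator.alloc(40 * 1024)
b = allocator.alloc(3 * MiB)
c = allocator.alloc(100 * 1024)
print(a, b, c, allocator.blocks)
```

Individual frees do no work until the last one, which is why the allocator only suits allocations that share a lifetime.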


Memory Management Conclusions 


• Shared memory between Onion/Garlic works well
• Map/unmap overhead is low
• Allows for dynamic budgets
• Defragmentation:
  - Expensive and complex
  - But almost no waste


CPU Scheduling 


CPU Scheduling 


• Threads managed by job scheduler
• Two job types:
  - Frame jobs (must complete each frame)
  - Non-frame jobs (long-running jobs)
• Three priorities for each job type
• Carefully selected thread affinities


Scheduling and Thread Affinity Model


CPU Scheduling Conclusions


• Full core occupancy achieved
• Clean separation between frame and non-frame jobs
• Non-frame jobs run in idle time of frame jobs
• Very few custom threads due to flexible system
• Better guarantees about completion and deadlines


Future Plans 


Future plans 


• Load object graphs, not file graphs
• Use key-value store for object storage/retrieval
• Hybrid VRAM solution:
  - Defragmentation for small allocations
  - Virtual memory for large allocations


Questions? 




